My Existing Model

My new local model, which includes district-level data on polling, incumbency, and local employment, is much more accurate. It has an R-squared of 0.73, the highest so far. That holds only when I use DemMajorVotePct as the outcome variable. When I use DemSeats (which my previous models used) as the outcome variable, I get a lower R-squared of exactly 0.50.

Below is a plot of the actual outcomes from the 2018 election, which is the source of my predictive modeling data.

Now I have to add ad spending at the local level. I downloaded 2018 election spending data from the FEC. This isn't exactly the ad spend per campaign, but I am using it as a proxy, on the assumption that the more money a particular race or candidate has overall, the more it spends on ads.
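As a rough sketch of the proxy step, candidate-level receipts could be totaled per race before being joined to the district data. The field names (`district`, `candidate`, `receipts`), the district labels, and the dollar amounts below are all made up for illustration; they are not the FEC's actual column names or figures. Whether to total by race or keep a single candidate's receipts is a modeling choice, and this sketch simply totals by race.

```python
from collections import defaultdict

# Hypothetical candidate-level rows. Field names and amounts are
# placeholders, not real FEC columns or values.
candidate_rows = [
    {"district": "AA-01", "candidate": "A", "receipts": 1_200_000.0},
    {"district": "AA-01", "candidate": "B", "receipts": 800_000.0},
    {"district": "AA-02", "candidate": "C", "receipts": 450_000.0},
]


def total_receipts_by_district(rows):
    """Sum receipts over all candidates in each district."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["district"]] += row["receipts"]
    return dict(totals)


# One receipts figure per district, ready to join to the model data.
district_receipts = total_receipts_by_district(candidate_rows)
print(district_receipts)
```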

Observations         448
Dependent variable   DemVotesMajorPercent
Type                 OLS linear regression

F(4,443)   312.97
R²         0.74
Adj. R²    0.74

                                 Est.    S.E.    t val.      p
(Intercept)                     77.96    3.42     22.77   0.00
avg                             -6.73    0.19    -35.15   0.00
Unemployed_prct                  0.35    0.89      0.39   0.70
winner_candidate_incIncumbent    4.96    1.04      4.79   0.00
Receipts                        -0.00    0.00     -3.02   0.00

Standard errors: OLS
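The Receipts estimate and standard error both print as 0.00 because receipts are presumably measured in dollars, so the effect of one extra dollar is tiny even though the t value is sizable. Rescaling the predictor (for example, to millions of dollars) multiplies the estimate and its standard error by the same factor and leaves the t value unchanged. The sketch below shows this with a one-predictor regression on made-up numbers; the same scaling property holds for a single coefficient in a multiple regression. None of the values come from my data.

```python
import math


def simple_ols(x, y):
    """One-predictor OLS: return (slope, standard error, t value)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (slope and intercept).
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2) / sxx)
    return slope, se, slope / se


# Made-up receipts (dollars) and vote shares (percent), illustration only.
receipts_dollars = [250_000.0, 900_000.0, 1_500_000.0,
                    2_400_000.0, 3_100_000.0, 4_000_000.0]
vote_pct = [58.0, 55.5, 54.0, 51.0, 50.5, 47.0]

# Same predictor expressed in millions of dollars.
receipts_millions = [r / 1_000_000 for r in receipts_dollars]

slope_d, se_d, t_d = simple_ols(receipts_dollars, vote_pct)
slope_m, se_m, t_m = simple_ols(receipts_millions, vote_pct)

print(f"dollars:  slope={slope_d:.2f}  se={se_d:.2f}  t={t_d:.2f}")
print(f"millions: slope={slope_m:.2f}  se={se_m:.2f}  t={t_m:.2f}")
```

In a table like the one above, dividing Receipts by 1,000,000 before fitting would make the estimate readable as the change in vote share per additional million dollars, without changing the fit.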